Introduction

Why are we doing this?

Music. Experts have long tried to understand sound: what differentiates one song from another, how to visualize sound, and what makes one tone different from another.

In this notebook we will go through an in-depth analysis of sound and how we can visualize, classify and ultimately understand it.

Purpose

1) Understand what an audio file is and which features we can visualize on this kind of data.

2) EDA: always useful, and essential here.

3) Build a recommender system: given a song, return the top X most similar songs.

import pandas as pd
import numpy as np
import seaborn as sns
import sklearn
import os

import librosa
import librosa.display
import IPython.display as ipd
import matplotlib.pyplot as plt

from sklearn import preprocessing
from sklearn.decomposition import PCA
from sklearn.metrics.pairwise import cosine_similarity

%matplotlib inline

import warnings
warnings.filterwarnings('ignore')

Download the data

Audio source: http://marsyas.info/downloads/datasets.html (GTZAN genre collection)

CSV source: features_30_sec.csv, downloaded from GitHub in the cell below

!wget http://opihi.cs.uvic.ca/sound/genres.tar.gz
!tar -xvf genres.tar.gz
!wget https://raw.githubusercontent.com/kcirerick/deep-music/master/features_30_sec.csv
# Base directory of the extracted data (the default working directory on Google Colab)
df = '/content'
print(list(os.listdir(f'{df}/genres/')))
['me.mf', 'classical', 'bl.mf', 'co.mf', 'country', 'ro.mf', 'disco', 'pop', 'hiphop', 'jazz', 'metal', 'cl.mf', 'input.mf', 'po.mf', 'blues', 'bextract_single.mf', 'hi.mf', 'di.mf', 'rock', 'ja.mf', 'reggae', 're.mf']

Explore Audio Data

Sound: a sequence of vibrations in pressure strength (y).

Sample rate (sr): the number of audio samples carried per second, measured in Hz or kHz.
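To make these two definitions concrete, the sketch below synthesizes one second of a pure tone with NumPy instead of loading a file. The 440 Hz tone and the 1-second length are arbitrary choices for illustration. The duration in seconds is the number of samples divided by the sample rate, which is the same check applied to the real clip further down.

```python
import numpy as np

sample_rate = 22050   # samples per second (librosa's default when loading)
duration_s = 1.0      # arbitrary clip length for illustration
freq_hz = 440.0       # arbitrary tone frequency (concert A)

# One pressure value per sample: sample_rate * duration_s samples in total
sample_times = np.arange(int(sample_rate * duration_s)) / sample_rate
tone = 0.5 * np.sin(2 * np.pi * freq_hz * sample_times)

print('samples:', len(tone))                 # samples: 22050
print('seconds:', len(tone) / sample_rate)   # seconds: 1.0
```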

# Load a 30-second clip; librosa resamples to 22050 Hz by default
y, sr = librosa.load(f'{df}/genres/pop/pop.00034.wav')

print('y:', y, '\n')
y: [-0.401886   -0.44421387 -0.40725708 ... -0.3289795  -0.39907837
 -0.5903015 ] 

print('y shape:', np.shape(y), '\n')
print('Sample Rate (Hz):', sr, '\n')
y shape: (661504,) 

Sample Rate (Hz): 22050 

print(f'Check length of the audio (s): {len(y) / sr:.2f}')
Check length of the audio (s): 30.00
# Trim leading and trailing silence
audio_file, _ = librosa.effects.trim(y)

# the result is a NumPy ndarray
print('Audio File:', audio_file, '\n')
print('Audio File shape:', np.shape(audio_file))
Audio File: [-0.401886   -0.44421387 -0.40725708 ... -0.3289795  -0.39907837
 -0.5903015 ] 

Audio File shape: (661504,)

2D representation: sound wave

plt.figure(figsize = (16, 6))
# waveshow replaces waveplot, which was removed in librosa 0.10
librosa.display.waveshow(y = audio_file, sr = sr, color = "#A300F9");
plt.title("Sound Waves in Pop 34", fontsize = 25);

Fourier Transform

1) A function that takes a signal in the time domain as input and outputs its decomposition into frequencies.

2) Then transform the y-axis (frequency) to a log scale and the “color” axis (amplitude) to decibels, which is approximately a log scale of amplitudes.
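To illustrate the first point, here is a minimal NumPy sketch that is not part of the original analysis. It takes the FFT of a synthetic signal made of two tones and recovers their frequencies from the largest magnitude peaks. The tone frequencies and amplitudes are arbitrary. Using exactly one second of audio makes each FFT bin 1 Hz wide, so both tones land on exact bins.

```python
import numpy as np

sample_rate = 22050
n_samples = sample_rate                      # 1 second of audio -> 1 Hz per FFT bin
sample_times = np.arange(n_samples) / sample_rate

# Mix two pure tones: 440 Hz (louder) and 1000 Hz (quieter)
signal = 1.0 * np.sin(2 * np.pi * 440 * sample_times) + 0.3 * np.sin(2 * np.pi * 1000 * sample_times)

# Real FFT: one magnitude per frequency from 0 Hz up to sample_rate / 2
magnitudes = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(n_samples, d=1.0 / sample_rate)

# The two largest peaks sit at the two tone frequencies
top_two = np.sort(freqs[np.argsort(magnitudes)[-2:]])
print(top_two)
```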

n_fft = 2048 # FFT window size
hop_length = 512 # number of audio samples between successive STFT columns (512 is librosa's default for n_fft = 2048)

# Short-time Fourier transform (STFT)
D = np.abs(librosa.stft(audio_file, n_fft = n_fft, hop_length = hop_length))

print('Shape of D object:', np.shape(D))
Shape of D object: (1025, 1293)
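The shape of D follows from the parameters. A real FFT of size n_fft yields 1 + n_fft / 2 frequency bins. With librosa's default centered framing, the number of frames is 1 + len(signal) // hop_length. The small helper below is hypothetical, written only for this check, and it reproduces (1025, 1293) for the 661504-sample clip.

```python
def stft_shape(n_samples, n_fft=2048, hop=512):
    """Expected (bins, frames) of a centered STFT, matching librosa's default framing."""
    n_bins = 1 + n_fft // 2          # non-negative frequencies of a real FFT
    n_frames = 1 + n_samples // hop  # one frame every `hop` samples, signal padded at both ends
    return n_bins, n_frames

print(stft_shape(661504))   # (1025, 1293)
```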
plt.figure(figsize = (16, 6))
plt.plot(D);


Spectrogram

  • What is a spectrogram?

-> A spectrogram is a visual representation of the spectrum of frequencies of a signal as it varies with time. When applied to an audio signal, spectrograms are sometimes called sonographs, voiceprints, or voicegrams (wiki).

  • Here we convert the amplitude to decibels and display the frequency axis on a logarithmic scale.
# Convert an amplitude spectrogram to a decibel (dB)-scaled spectrogram
DB = librosa.amplitude_to_db(D, ref = np.max)

# Create the spectrogram
plt.figure(figsize = (16, 6))
librosa.display.specshow(DB, sr = sr, hop_length = hop_length, x_axis = 'time', y_axis = 'log', cmap = 'cool')
plt.colorbar();
plt.title("pop 34", fontsize = 25);
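The decibel conversion above can be sketched in NumPy as 20 * log10(amplitude / reference). With ref=np.max, the loudest value becomes the reference, so it maps to 0 dB. The 1e-5 floor and 80 dB clipping below are meant to mirror librosa's documented defaults (amin and top_db) as we understand them. Treat this as an approximation, not librosa's actual code.

```python
import numpy as np

def amplitude_to_db_sketch(amplitude, amin=1e-5, top_db=80.0):
    """Approximate librosa.amplitude_to_db(amplitude, ref=np.max)."""
    amplitude = np.asarray(amplitude, dtype=float)
    ref = np.max(amplitude)
    # Floor tiny values to avoid log10(0), then express everything relative to the peak
    db = 20.0 * np.log10(np.maximum(amin, amplitude) / max(amin, ref))
    # Clip anything more than top_db below the peak
    return np.maximum(db, db.max() - top_db)

db_values = amplitude_to_db_sketch([1.0, 0.1, 0.01, 1e-6])
print(db_values)   # roughly 0, -20, -40, and -100 clipped up to -80
```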

Mel Spectrogram

  • The mel scale is a non-linear transformation of the frequency scale, designed so that equal distances in mels sound roughly equally far apart to a listener. The Mel Spectrogram is a normal spectrogram, but with a mel scale on the y-axis.
y, sr = librosa.load(f'{df}/genres/metal/metal.00036.wav')
y, _ = librosa.effects.trim(y)


# melspectrogram returns a power spectrogram (power=2.0 by default), so convert it with power_to_db
S = librosa.feature.melspectrogram(y=y, sr=sr)
S_DB = librosa.power_to_db(S, ref=np.max)
plt.figure(figsize = (16, 6))
librosa.display.specshow(S_DB, sr=sr, hop_length=hop_length, x_axis = 'time', y_axis = 'mel',
                        cmap = 'cool');
plt.colorbar();
plt.title("Metal Mel Spectrogram", fontsize = 23);
y, sr = librosa.load(f'{df}/genres/classical/classical.00036.wav')
y, _ = librosa.effects.trim(y)


# melspectrogram returns a power spectrogram (power=2.0 by default), so convert it with power_to_db
S = librosa.feature.melspectrogram(y=y, sr=sr)
S_DB = librosa.power_to_db(S, ref=np.max)
plt.figure(figsize = (16, 6))
librosa.display.specshow(S_DB, sr=sr, hop_length=hop_length, x_axis = 'time', y_axis = 'mel',
                        cmap = 'cool');
plt.colorbar();
plt.title("Classical Mel Spectrogram", fontsize = 23);
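One common formula for the mel scale is the HTK variant, mel = 2595 * log10(1 + f / 700). librosa's mel filters use the Slaney variant by default, which is linear below 1 kHz and logarithmic above. This sketch therefore only illustrates how high frequencies get compressed, not librosa's exact mapping.

```python
import numpy as np

def hz_to_mel_htk(f_hz):
    """HTK-style mel scale: close to linear at low frequencies, logarithmic at high ones."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)

def mel_to_hz_htk(mel):
    """Inverse of hz_to_mel_htk."""
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)

freqs_hz = np.array([0.0, 1000.0, 2000.0, 3000.0, 4000.0])
mels = hz_to_mel_htk(freqs_hz)
print(np.round(mels, 1))
# Equal 1000 Hz steps become smaller and smaller steps in mels
print(np.round(np.diff(mels), 1))
```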

Audio features

Zero Crossing Rate

  • The rate at which the signal changes sign, from positive to negative or back.
zero_crossings = librosa.zero_crossings(audio_file, pad=False)
print(sum(zero_crossings))
86782
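librosa.zero_crossings returns a boolean array that marks each crossing. Summing it therefore gives a count for the whole clip (86782 here) rather than a rate. Dividing by the clip duration gives crossings per second. Below is a minimal NumPy sketch of the counting idea. It treats 0 as positive, which is our understanding of librosa's default zero_pos=True.

```python
import numpy as np

def count_zero_crossings(signal):
    """Count sign changes between consecutive samples (0 counts as positive)."""
    negative = np.signbit(np.asarray(signal, dtype=float))
    return int(np.count_nonzero(negative[1:] != negative[:-1]))

def crossings_per_second(signal, sample_rate):
    """Zero-crossing count normalized by the clip duration in seconds."""
    return count_zero_crossings(signal) / (len(signal) / sample_rate)

print(count_zero_crossings([0.5, 0.2, -0.1, -0.3, 0.4]))   # 2
```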

Harmonics and Percussives

  • The harmonic component holds the sustained, pitched content of the signal; it relates to the sound color (timbre).
  • The percussive component holds short, shock-wave-like transients such as drum hits; it reflects the rhythm of the sound.
y_harm, y_perc = librosa.effects.hpss(audio_file)

plt.figure(figsize = (16, 6))
plt.plot(y_harm, color = '#A300F9');
plt.plot(y_perc, color = '#FFB100');
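librosa documents hpss as harmonic/percussive separation based on median filtering. Harmonic content forms horizontal lines in a spectrogram because it is steady over time. Percussive hits form vertical lines because they are broadband but brief. The sketch below is a simplified, NumPy-only version of that idea using hard (binary) masks. librosa uses soft masks by default, so the kernel size and masking here are illustrative assumptions.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def median_along(spec, size, axis):
    """Median-filter a 2D array along one axis (edge padding, odd `size`)."""
    pad = [(0, 0), (0, 0)]
    pad[axis] = (size // 2, size // 2)
    windows = sliding_window_view(np.pad(spec, pad, mode="edge"), size, axis=axis)
    return np.median(windows, axis=-1)

def hpss_sketch(spec, size=5):
    """Split a magnitude spectrogram (freq x time) into harmonic and percussive parts."""
    harmonic_enh = median_along(spec, size, axis=1)    # smooth over time: keeps horizontal lines
    percussive_enh = median_along(spec, size, axis=0)  # smooth over frequency: keeps vertical lines
    harmonic_mask = harmonic_enh >= percussive_enh     # hard mask; ties go to harmonic
    return spec * harmonic_mask, spec * ~harmonic_mask

# Toy spectrogram: a steady tone (row 4) crossed by a single click (column 4)
toy_spec = np.zeros((9, 9))
toy_spec[4, :] = 1.0
toy_spec[:, 4] = 1.0

harmonic, percussive = hpss_sketch(toy_spec)
print(harmonic.sum(axis=1))    # all harmonic energy sits in the tone's row
print(percussive.sum(axis=0))  # the rest of the click's column is percussive
```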

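Tempo BPM (beats per minute)

librosa.beat.beat_track estimates the tempo with a dynamic programming beat tracker.

# Note: y currently holds the trimmed classical.00036 clip loaded in the Mel Spectrogram section, not pop 34
tempo, _ = librosa.beat.beat_track(y=y, sr=sr)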
tempo
107.666015625
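A tempo in BPM is the inverse of the beat period: BPM = 60 / (seconds per beat). The ~107.7 BPM above therefore means a beat roughly every 0.56 s. Given beat times, a rough tempo can be sketched from the median inter-beat interval. For example, you could take the beat frames that beat_track also returns and convert them with librosa.frames_to_time. The beat times below are made up, and this is not how librosa's tracker computes its estimate.

```python
import numpy as np

def bpm_from_beat_times(beat_times_s):
    """Estimate tempo from beat timestamps (seconds) via the median inter-beat interval."""
    intervals = np.diff(np.asarray(beat_times_s, dtype=float))
    return 60.0 / np.median(intervals)

# Hypothetical beat times: one beat every 0.5 s
print(bpm_from_beat_times([0.0, 0.5, 1.0, 1.5, 2.0]))   # 120.0
```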

Spectral Centroid

  • Indicates where the "centre of mass" of a sound is located; it is calculated as the weighted mean of the frequencies present in the sound.
spectral_centroids = librosa.feature.spectral_centroid(y=audio_file, sr=sr)[0]

# Shape is a vector
print('Centroids:', spectral_centroids, '\n')
print('Shape of Spectral Centroids:', spectral_centroids.shape, '\n')

# Computing the time variable for visualization
frames = range(len(spectral_centroids))

# Convert frame indices to time (seconds); spectral_centroid uses hop_length=512 by default
t = librosa.frames_to_time(frames, sr=sr, hop_length=512)

print('frames:', frames, '\n')
print('t:', t)

# Min-max scale a feature to [0, 1] so it can be overlaid on the waveform
def normalize(x, axis=0):
    return sklearn.preprocessing.minmax_scale(x, axis=axis)
Centroids: [1785.73476425 1737.88416355 1591.71261131 ... 2868.61364503 3105.07750757
 3046.069826  ] 

Shape of Spectral Centroids: (1293,) 

frames: range(0, 1293) 

t: [0.00000000e+00 2.32199546e-02 4.64399093e-02 ... 2.99537415e+01
 2.99769615e+01 3.00001814e+01]
plt.figure(figsize = (16, 6))
librosa.display.waveshow(audio_file, sr=sr, alpha=0.4, color = '#A300F9');
plt.plot(t, normalize(spectral_centroids), color='#FFB100');
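For each frame, the centroid is the magnitude-weighted mean frequency, sum(f * S(f)) / sum(S(f)). Below is a NumPy sketch on a toy magnitude spectrum. The bin frequencies and magnitudes are invented for illustration.

```python
import numpy as np

def spectral_centroid_sketch(magnitudes, freqs):
    """Weighted mean of `freqs`, using spectral magnitudes as weights (one frame per column)."""
    magnitudes = np.asarray(magnitudes, dtype=float)
    freqs = np.asarray(freqs, dtype=float).reshape(-1, 1)
    return (freqs * magnitudes).sum(axis=0) / magnitudes.sum(axis=0)

bin_freqs = [0.0, 1000.0, 2000.0, 3000.0]
frames_mag = np.array([[0.0, 1.0],
                       [1.0, 0.0],
                       [0.0, 0.0],
                       [1.0, 1.0]])   # 4 frequency bins x 2 frames

centroids_demo = spectral_centroid_sketch(frames_mag, bin_freqs)
print(centroids_demo)   # frame 0 balances 1000 and 3000 Hz; frame 1 balances 0 and 3000 Hz
```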

Spectral Rolloff

  • A measure of the shape of the spectrum: the frequency below which a specified percentage of the total spectral energy lies (85% by default in librosa).
spectral_rolloff = librosa.feature.spectral_rolloff(y=audio_file, sr=sr)[0]

# The plot
plt.figure(figsize = (16, 6))
librosa.display.waveshow(audio_file, sr=sr, alpha=0.4, color = '#A300F9');
plt.plot(t, normalize(spectral_rolloff), color='#FFB100');
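As far as we can tell, librosa computes rolloff from the cumulative sum of the spectrogram it is given (magnitudes by default). It returns the lowest bin frequency at which that sum reaches roll_percent of the total. Below is a single-frame NumPy sketch of that rule on invented values.

```python
import numpy as np

def spectral_rolloff_sketch(magnitudes, freqs, roll_percent=0.85):
    """Lowest frequency at which the cumulative magnitude reaches roll_percent of the total."""
    cumulative = np.cumsum(np.asarray(magnitudes, dtype=float))
    idx = np.argmax(cumulative >= roll_percent * cumulative[-1])   # first bin meeting the threshold
    return float(np.asarray(freqs, dtype=float)[idx])

print(spectral_rolloff_sketch([1, 1, 1, 1], [0, 100, 200, 300]))       # 300.0
print(spectral_rolloff_sketch([8, 1, 0.5, 0.5], [0, 100, 200, 300]))   # 100.0
```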

Mel-Frequency Cepstral Coefficients

  • The Mel-frequency cepstral coefficients (MFCCs) of a signal are a small set of features (usually about 10–20) that concisely describe the overall shape of the spectral envelope. They are widely used to model the characteristics of the human voice.
mfccs = librosa.feature.mfcc(y=audio_file, sr=sr)
print('mfccs shape:', mfccs.shape)

# Display the MFCCs
plt.figure(figsize = (16, 6))
librosa.display.specshow(mfccs, sr=sr, x_axis='time', cmap = 'cool');
mfccs shape: (20, 1293)
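The usual MFCC recipe ends with a discrete cosine transform (DCT) of the log mel spectrum. The DCT packs the smooth spectral envelope into the first few coefficients, which is why 20 of them (the rows above) are enough. The sketch below builds an orthonormal DCT-II matrix in NumPy and applies it to one frame of hypothetical log-mel values. librosa's own pipeline (mel filterbank, dB scaling, DCT options) has more details than shown here.

```python
import numpy as np

def dct2_ortho_matrix(n):
    """Orthonormal DCT-II basis: row k is a cosine with k half-periods across n bands."""
    k = np.arange(n).reshape(-1, 1)
    m = np.arange(n).reshape(1, -1)
    basis = np.sqrt(2.0 / n) * np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    basis[0] /= np.sqrt(2.0)   # the k = 0 row is scaled by sqrt(1 / n) instead
    return basis

def cepstrum_from_log_mel(log_mel_frame, n_coeffs):
    """First n_coeffs DCT-II coefficients of one frame of log-mel energies."""
    log_mel_frame = np.asarray(log_mel_frame, dtype=float)
    return (dct2_ortho_matrix(len(log_mel_frame)) @ log_mel_frame)[:n_coeffs]

# A perfectly flat envelope (8 hypothetical mel bands at -20 dB) has no shape,
# so only the first coefficient (overall level) is non-zero
flat_coeffs = cepstrum_from_log_mel(np.full(8, -20.0), n_coeffs=4)
print(np.round(flat_coeffs, 3))
```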

The coefficients live on very different scales, so we standardize each one across time (zero mean, unit variance):

mfccs = sklearn.preprocessing.scale(mfccs, axis=1)
print('Mean:', mfccs.mean(), '\n')
print('Var:', mfccs.var())

plt.figure(figsize = (16, 6))
librosa.display.specshow(mfccs, sr=sr, x_axis='time', cmap = 'cool');
Mean: -2.9502685e-09 

Var: 0.99999994
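sklearn.preprocessing.scale(mfccs, axis=1) standardizes each coefficient (row) over time. It subtracts the row mean and divides by the row's population standard deviation. The tiny non-zero mean printed above is floating-point rounding. Here is the same arithmetic in NumPy on a small made-up matrix.

```python
import numpy as np

def standardize_rows(matrix):
    """Zero-mean, unit-variance scaling of each row, like sklearn.preprocessing.scale(axis=1).
    Unlike sklearn, rows with zero variance are not handled here."""
    matrix = np.asarray(matrix, dtype=float)
    mean = matrix.mean(axis=1, keepdims=True)
    std = matrix.std(axis=1, keepdims=True)   # population std (ddof=0)
    return (matrix - mean) / std

toy_coeffs = np.array([[-300.0, -250.0, -200.0],   # a large-valued coefficient
                       [1.0, 2.0, 6.0]])           # a small-valued one
scaled = standardize_rows(toy_coeffs)
print(scaled.mean(axis=1), scaled.var(axis=1))
```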

Chroma Frequencies

  • Chroma features are a powerful representation for music audio in which the entire spectrum is projected onto 12 bins representing the 12 distinct semitones (or chroma) of the musical octave.
hop_length = 5000  # coarser hop: 5000 samples is about 0.23 s per frame at 22050 Hz

# Chromagram
chromagram = librosa.feature.chroma_stft(y=audio_file, sr=sr, hop_length=hop_length)
print('Chromagram shape:', chromagram.shape)

plt.figure(figsize=(16, 6))
librosa.display.specshow(chromagram, x_axis='time', y_axis='chroma', hop_length=hop_length, cmap='coolwarm');
Chromagram shape: (12, 133)
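Chroma folds all octaves together. A frequency f maps to the MIDI note number 69 + 12 * log2(f / 440), and that number modulo 12 gives the pitch class, with C as 0. The sketch below assumes A4 = 440 Hz tuning. librosa's chroma_stft adds a chroma filter bank and tuning estimation on top of this idea, so the sketch only shows the core mapping.

```python
import numpy as np

PITCH_CLASSES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']

def pitch_class(freq_hz, a4_hz=440.0):
    """Name of the nearest semitone, ignoring the octave."""
    midi = 69 + 12 * np.log2(freq_hz / a4_hz)
    return PITCH_CLASSES[int(np.round(midi)) % 12]

for f in [261.63, 440.0, 880.0, 1318.51]:
    print(f, pitch_class(f))   # C, A, A (one octave up), E
```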

EDA (Exploratory Data Analysis)

EDA is performed on features_30_sec.csv. For each audio file, this file contains the mean and variance of the features analysed above, plus a single tempo value.

The table therefore has 1000 rows (10 genres x 100 audio files) and 60 columns: the file name, the clip length in samples, 57 numeric features and the genre label.
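Each frame-level feature is summarized by two numbers. The sketch below shows a minimal version of that aggregation. The column-naming pattern copies the CSV header. The code that actually produced features_30_sec.csv is not shown in this notebook, so the aggregation (including the use of population variance) is an assumption.

```python
import numpy as np

def summarize_feature(name, frame_values):
    """Collapse a frame-level feature into '<name>_mean' and '<name>_var' entries."""
    frame_values = np.asarray(frame_values, dtype=float)
    return {f'{name}_mean': float(frame_values.mean()),
            f'{name}_var': float(frame_values.var())}   # population variance (ddof=0), assumed

# Toy frame-level values for two features of one clip
row = {}
row.update(summarize_feature('spectral_centroid', [1500.0, 1700.0, 1900.0]))
row.update(summarize_feature('rms', [0.1, 0.2, 0.3]))
print(row)
```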

data = pd.read_csv('features_30_sec.csv')
data.head()
filename length chroma_stft_mean chroma_stft_var rms_mean rms_var spectral_centroid_mean spectral_centroid_var spectral_bandwidth_mean spectral_bandwidth_var rolloff_mean rolloff_var zero_crossing_rate_mean zero_crossing_rate_var harmony_mean harmony_var perceptr_mean perceptr_var tempo mfcc1_mean mfcc1_var mfcc2_mean mfcc2_var mfcc3_mean mfcc3_var mfcc4_mean mfcc4_var mfcc5_mean mfcc5_var mfcc6_mean mfcc6_var mfcc7_mean mfcc7_var mfcc8_mean mfcc8_var mfcc9_mean mfcc9_var mfcc10_mean mfcc10_var mfcc11_mean mfcc11_var mfcc12_mean mfcc12_var mfcc13_mean mfcc13_var mfcc14_mean mfcc14_var mfcc15_mean mfcc15_var mfcc16_mean mfcc16_var mfcc17_mean mfcc17_var mfcc18_mean mfcc18_var mfcc19_mean mfcc19_var mfcc20_mean mfcc20_var label
0 blues.00000.wav 661794 0.350088 0.088757 0.130228 0.002827 1784.165850 129774.064525 2002.449060 85882.761315 3805.839606 9.015054e+05 0.083045 0.000767 -4.529724e-05 0.008172 0.000008 0.005698 123.046875 -113.570648 2564.207520 121.571793 295.913818 -19.168142 235.574432 42.366421 151.106873 -6.364664 167.934799 18.623499 89.180840 -13.704891 67.660492 15.343150 68.932579 -12.274110 82.204201 10.976572 63.386311 -8.326573 61.773094 8.803792 51.244125 -3.672300 41.217415 5.747995 40.554478 -5.162882 49.775421 0.752740 52.420910 -1.690215 36.524071 -0.408979 41.597103 -2.303523 55.062923 1.221291 46.936035 blues
1 blues.00001.wav 661794 0.340914 0.094980 0.095948 0.002373 1530.176679 375850.073649 2039.036516 213843.755497 3550.522098 2.977893e+06 0.056040 0.001448 1.395807e-04 0.005099 -0.000178 0.003063 67.999589 -207.501694 7764.555176 123.991264 560.259949 8.955127 572.810913 35.877647 264.506104 2.907320 279.932922 21.510466 156.477097 -8.560436 200.849182 23.370686 142.555954 -10.099661 166.108521 11.900497 104.358612 -5.555639 105.173630 5.376327 96.197212 -2.231760 64.914291 4.220140 73.152534 -6.012148 52.422142 0.927998 55.356403 -0.731125 60.314529 0.295073 48.120598 -0.283518 51.106190 0.531217 45.786282 blues
2 blues.00002.wav 661794 0.363637 0.085275 0.175570 0.002746 1552.811865 156467.643368 1747.702312 76254.192257 3042.260232 7.840345e+05 0.076291 0.001007 2.105576e-06 0.016342 -0.000019 0.007458 161.499023 -90.722595 3319.044922 140.446304 508.765045 -29.093889 411.781219 31.684334 144.090317 -13.984504 155.493759 25.764742 74.548401 -13.664875 106.981827 11.639934 106.574875 -11.783643 65.447945 9.718760 67.908859 -13.133803 57.781425 5.791199 64.480209 -8.907628 60.385151 -1.077000 57.711136 -9.229274 36.580986 2.451690 40.598766 -7.729093 47.639427 -1.816407 52.382141 -3.439720 46.639660 -2.231258 30.573025 blues
3 blues.00003.wav 661794 0.404785 0.093999 0.141093 0.006346 1070.106615 184355.942417 1596.412872 166441.494769 2184.745799 1.493194e+06 0.033309 0.000423 4.583644e-07 0.019054 -0.000014 0.002712 63.024009 -199.544205 5507.517090 150.090897 456.505402 5.662678 257.161163 26.859079 158.267303 1.771399 268.034393 14.234031 126.794128 -4.832006 155.912079 9.286494 81.273743 -0.759186 92.114090 8.137607 71.314079 -3.200653 110.236687 6.079319 48.251999 -2.480174 56.799400 -1.079305 62.289902 -2.870789 51.651592 0.780874 44.427753 -3.319597 50.206673 0.636965 37.319130 -0.619121 37.259739 -3.407448 31.949339 blues
4 blues.00004.wav 661794 0.308526 0.087841 0.091529 0.002303 1835.004266 343399.939274 1748.172116 88445.209036 3579.757627 1.572978e+06 0.101461 0.001954 -1.756129e-05 0.004814 -0.000010 0.003094 135.999178 -160.337708 5195.291992 126.219635 853.784729 -35.587811 333.792938 22.148071 193.456100 -32.478600 336.276825 10.852294 134.831573 -23.352329 93.257095 0.498434 124.672127 -11.793437 130.073349 1.207256 99.675575 -13.088418 80.254066 -2.813867 86.430626 -6.933385 89.555443 -7.552725 70.943336 -9.164666 75.793404 -4.520576 86.099236 -5.454034 75.269707 -0.916874 53.613918 -4.404827 62.910812 -11.703234 55.195160 blues

Correlation Heatmap for feature means

mean_cols = [col for col in data.columns if 'mean' in col]
corr = data[mean_cols].corr()

# Generate a mask for the upper triangle (np.bool was removed in NumPy 1.24; use the builtin bool)
mask = np.triu(np.ones_like(corr, dtype=bool))

# Set up the matplotlib figure
f, ax = plt.subplots(figsize=(16, 11));

# Generate a custom diverging colormap
cmap = sns.diverging_palette(0, 25, as_cmap=True, s = 90, l = 45, n = 5)

# Draw the heatmap with the mask and correct aspect ratio
sns.heatmap(corr, mask=mask, cmap=cmap, vmax=.3, center=0,
            square=True, linewidths=.5, cbar_kws={"shrink": .5})

plt.title('Correlation Heatmap (for the MEAN variables)', fontsize = 25)
plt.xticks(fontsize = 10)
plt.yticks(fontsize = 10);

Box Plot for Genres Distributions

x = data[["label", "tempo"]]

f, ax = plt.subplots(figsize=(16, 9));
sns.boxplot(x = "label", y = "tempo", data = x, palette = 'husl');

plt.title('BPM Boxplot for Genres', fontsize = 25)
plt.xticks(fontsize = 14)
plt.yticks(fontsize = 10);
plt.xlabel("Genre", fontsize = 15)
plt.ylabel("BPM", fontsize = 15)

Principal Component Analysis - to visualize possible groups of genres

1) Normalization

2) PCA

3) The Scatter Plot

# Drop the filename column; X keeps the numeric columns, y the genre label
data = data.iloc[:, 1:]
y = data['label']
X = data.loc[:, data.columns != 'label']

#### NORMALIZE X ####
cols = X.columns
min_max_scaler = preprocessing.MinMaxScaler()
np_scaled = min_max_scaler.fit_transform(X)
X = pd.DataFrame(np_scaled, columns = cols)


#### PCA 2 COMPONENTS ####
pca = PCA(n_components=2)
principalComponents = pca.fit_transform(X)
principalDf = pd.DataFrame(data = principalComponents, columns = ['principal component 1', 'principal component 2'])

# concatenate with target label
finalDf = pd.concat([principalDf, y], axis = 1)

pca.explained_variance_ratio_

# The first two components explain about 46.7% of the variance (0.2464 + 0.2203)
array([0.24644968, 0.22028192])
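explained_variance_ratio_ is each component's variance divided by the total variance. Below is a NumPy sketch that gets these ratios from the SVD of the centered data, which is how PCA works up to solver details. It runs on random toy data rather than the GTZAN features.

```python
import numpy as np

def explained_variance_ratio(features, n_components):
    """Fraction of total variance captured by each leading principal component."""
    centered = features - features.mean(axis=0)
    singular_values = np.linalg.svd(centered, compute_uv=False)   # sorted in descending order
    variances = singular_values ** 2
    return variances[:n_components] / variances.sum()

rng = np.random.default_rng(0)
toy = rng.normal(size=(200, 5)) * np.array([5.0, 2.0, 1.0, 0.5, 0.1])   # unequal spread per column
ratios = explained_variance_ratio(toy, n_components=2)
print(ratios, ratios.sum())
```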
plt.figure(figsize = (16, 9))
sns.scatterplot(x = "principal component 1", y = "principal component 2", data = finalDf, hue = "label", alpha = 0.7,
               s = 100);

plt.title('PCA on Genres', fontsize = 25)
plt.xticks(fontsize = 14)
plt.yticks(fontsize = 10);
plt.xlabel("Principal Component 1", fontsize = 15)
plt.ylabel("Principal Component 2", fontsize = 15)
plt.savefig("PCA Scatter.jpg")

Recommender System

Recommender systems let us take any given vector and rank every other vector by similarity, in descending order, from the best match to the worst match.

For audio files, this will be done with the cosine_similarity function from sklearn.metrics.pairwise.

data = pd.read_csv('features_30_sec.csv', index_col='filename')

# Extract labels
labels = data[['label']]

# Drop labels from original dataframe
data = data.drop(columns=['length','label'])
data.head()

# Scale the data
data_scaled=preprocessing.scale(data)
print('Scaled data type:', type(data_scaled))
Scaled data type: <class 'numpy.ndarray'>

Cosine similarity

Calculates the pairwise cosine similarity for each combination of songs in the data. This results in a 1000 x 1000 matrix. The matrix is symmetric (the similarity of song A to song B equals that of B to A), so half of the information is redundant.
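Cosine similarity is the dot product of two vectors divided by the product of their lengths. It therefore measures direction and ignores magnitude. Below is a NumPy sketch of the full pairwise matrix, which is what sklearn's cosine_similarity returns for a single input, computed on a tiny made-up matrix.

```python
import numpy as np

def pairwise_cosine(matrix):
    """Cosine similarity between every pair of rows."""
    matrix = np.asarray(matrix, dtype=float)
    unit_rows = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    return unit_rows @ unit_rows.T

toy_songs = np.array([[1.0, 0.0],
                      [2.0, 0.0],    # same direction as row 0, twice as long
                      [0.0, 3.0]])   # orthogonal to row 0
toy_similarity = pairwise_cosine(toy_songs)
print(np.round(toy_similarity, 3))
```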

similarity = cosine_similarity(data_scaled)
print("Similarity shape:", similarity.shape)

# Convert into a dataframe and then set the row index and column names as labels
sim_df_labels = pd.DataFrame(similarity)
sim_df_names = sim_df_labels.set_index(labels.index)
sim_df_names.columns = labels.index

sim_df_names.head()
Similarity shape: (1000, 1000)
filename blues.00000.wav blues.00001.wav blues.00002.wav blues.00003.wav blues.00004.wav blues.00005.wav blues.00006.wav blues.00007.wav blues.00008.wav blues.00009.wav blues.00010.wav blues.00011.wav blues.00012.wav blues.00013.wav blues.00014.wav blues.00015.wav blues.00016.wav blues.00017.wav blues.00018.wav blues.00019.wav blues.00020.wav blues.00021.wav blues.00022.wav blues.00023.wav blues.00024.wav blues.00025.wav blues.00026.wav blues.00027.wav blues.00028.wav blues.00029.wav blues.00030.wav blues.00031.wav blues.00032.wav blues.00033.wav blues.00034.wav blues.00035.wav blues.00036.wav blues.00037.wav blues.00038.wav blues.00039.wav ... rock.00060.wav rock.00061.wav rock.00062.wav rock.00063.wav rock.00064.wav rock.00065.wav rock.00066.wav rock.00067.wav rock.00068.wav rock.00069.wav rock.00070.wav rock.00071.wav rock.00072.wav rock.00073.wav rock.00074.wav rock.00075.wav rock.00076.wav rock.00077.wav rock.00078.wav rock.00079.wav rock.00080.wav rock.00081.wav rock.00082.wav rock.00083.wav rock.00084.wav rock.00085.wav rock.00086.wav rock.00087.wav rock.00088.wav rock.00089.wav rock.00090.wav rock.00091.wav rock.00092.wav rock.00093.wav rock.00094.wav rock.00095.wav rock.00096.wav rock.00097.wav rock.00098.wav rock.00099.wav
filename
blues.00000.wav 1.000000 0.049231 0.589618 0.284862 0.025561 -0.346688 -0.219483 -0.167626 0.641877 -0.097889 -0.004725 -0.138701 0.022420 0.170770 -0.100287 -0.061984 0.038424 0.070591 -0.043250 -0.032994 0.506165 -0.103861 -0.153357 -0.021814 -0.068566 -0.103846 -0.286219 -0.017917 0.019182 -0.245793 -0.158164 -0.203505 -0.238364 -0.245459 -0.182820 0.016867 -0.108189 -0.377991 -0.102337 -0.288664 ... -0.180951 -0.357947 -0.499987 -0.583565 0.582948 0.455372 -0.117475 0.291773 0.535460 0.111680 -0.018038 0.616646 0.616184 0.619470 0.737537 0.391488 0.386102 0.524411 0.383944 0.635159 0.483003 0.338310 0.341182 0.704877 0.468150 -0.165756 -0.033461 0.655153 0.271846 0.484170 -0.082829 0.546169 0.578558 0.662590 0.571629 0.610942 0.640835 0.496294 0.284958 0.304098
blues.00001.wav 0.049231 1.000000 -0.096834 0.520903 0.080749 0.307856 0.318286 0.415258 0.120649 0.404168 0.187969 0.537564 0.116593 0.138999 0.372891 0.339293 0.243391 0.293110 0.242766 0.350323 0.173128 0.374270 0.433072 0.041210 0.293562 0.208121 0.265333 0.069686 0.193576 0.315657 0.197519 0.174806 0.336295 0.264876 0.151645 0.155828 0.129249 0.296153 0.236439 0.296356 ... -0.292708 0.038637 -0.141359 0.418910 -0.177155 -0.108917 0.554469 0.031773 -0.153006 0.026416 -0.086265 -0.040695 -0.088279 -0.219520 -0.169743 0.043461 0.234076 -0.143892 0.226681 -0.234705 -0.254766 0.087614 -0.204128 -0.241611 -0.207278 0.031050 0.220761 -0.295174 -0.133969 0.047796 -0.098111 -0.325126 -0.370792 -0.191698 -0.330834 -0.077301 -0.222119 -0.302573 0.499562 0.311723
blues.00002.wav 0.589618 -0.096834 1.000000 0.210411 0.400266 -0.082019 -0.028061 0.104446 0.468113 -0.132532 0.220436 0.057667 0.214217 0.026326 -0.174039 -0.091717 0.042537 0.056512 -0.294934 0.037617 0.287231 -0.181328 -0.235117 0.129427 -0.020690 -0.191341 -0.372636 -0.058686 0.120372 0.075361 0.216751 0.196171 0.082313 0.175337 0.256547 0.392760 0.325772 -0.130494 0.242670 0.045100 ... -0.386603 -0.475932 -0.462687 -0.531435 0.657971 0.434122 -0.034768 0.271837 0.451977 0.175507 0.234224 0.450491 0.417995 0.488881 0.503598 0.348458 0.357615 0.455360 0.404612 0.631893 0.443831 0.326861 0.241868 0.550154 0.420591 0.109318 0.109191 0.594156 0.192450 0.566140 -0.032408 0.561074 0.590779 0.583293 0.514537 0.495707 0.566837 0.589983 0.216378 0.321069
blues.00003.wav 0.284862 0.520903 0.210411 1.000000 0.126437 0.134796 0.300746 0.324566 0.352758 0.295184 0.339783 0.414037 0.103369 0.220996 0.224956 0.228406 0.042363 0.261825 0.065738 0.303616 0.225934 0.248292 0.260467 0.101586 0.330638 0.255722 0.167251 0.160013 0.378844 0.163240 0.115658 0.078846 0.199018 0.076584 0.028351 0.183075 0.142251 0.039708 0.179125 0.209520 ... -0.104328 0.032359 -0.210963 0.119993 -0.109893 -0.073297 0.051193 -0.159978 -0.190545 -0.384509 -0.314229 0.010499 0.061320 -0.050735 0.042413 -0.168371 0.291878 -0.034075 0.213593 -0.047289 -0.232775 -0.261782 -0.300883 -0.029997 -0.253542 -0.343440 -0.280107 -0.113291 -0.274467 0.285601 -0.320107 -0.206516 -0.151132 0.041986 -0.172515 -0.000287 0.020515 -0.107821 0.502279 0.183210
blues.00004.wav 0.025561 0.080749 0.400266 0.126437 1.000000 0.556066 0.482195 0.623455 0.029703 0.471657 0.425722 0.440986 0.375045 0.112140 -0.042368 0.160624 0.103306 0.137831 -0.088361 0.321315 0.019133 -0.042296 0.035282 0.138877 0.058387 -0.062569 -0.134307 0.079337 0.127336 0.419068 0.558278 0.478488 0.543509 0.524158 0.520597 0.594418 0.491166 0.387443 0.573986 0.506192 ... -0.245340 -0.182963 -0.079161 -0.017619 0.304630 0.175296 0.380586 0.099357 0.021247 0.326572 0.481840 -0.236271 -0.207946 -0.151639 -0.163956 -0.018372 0.075316 -0.122199 0.065956 0.126620 0.057758 0.171239 -0.041117 -0.007086 -0.006466 0.477521 0.256614 -0.076402 0.105612 0.368627 0.087605 0.017366 0.138035 0.104684 -0.034594 0.063454 0.063546 0.172944 0.153192 0.061785

5 rows × 1000 columns

Song similarity scoring

find_similar_songs() is a helper function that takes the name of a song and prints its top 5 best matches.

def find_similar_songs(name):
    # Find songs most similar to another song
    series = sim_df_names[name].sort_values(ascending = False)
    
    # Remove cosine similarity == 1 (songs will always have the best match with themselves)
    series = series.drop(name)
    
    # Display the 5 top matches 
    print("\n*******\nSimilar songs to ", name)
    print(series.head(5))

Putting the Similarity Function into Action

Rock Example

find_similar_songs('rock.00067.wav') 

ipd.Audio(f'{df}/genres/rock/rock.00067.wav')
*******
Similar songs to  rock.00067.wav
filename
rock.00068.wav     0.837088
rock.00065.wav     0.782433
metal.00065.wav    0.772132
metal.00044.wav    0.772132
metal.00041.wav    0.765587
Name: rock.00067.wav, dtype: float64

Similar song match no.1

ipd.Audio(f'{df}/genres/rock/rock.00068.wav')

Similar song match no.2

ipd.Audio(f'{df}/genres/rock/rock.00065.wav')

Similar song match no.3

ipd.Audio(f'{df}/genres/metal/metal.00065.wav')

Similar song match no.4

ipd.Audio(f'{df}/genres/metal/metal.00044.wav')

Similar song match no.5

ipd.Audio(f'{df}/genres/metal/metal.00041.wav')